ABSTRACT


Source inference, the inverse problem of identifying the node(s) from which
a diffusion started, is difficult even for well-studied diffusion models. In
real-world networks the challenge is compounded because the true propagation
dynamics are typically unknown, and when they have been directly measured, they
rarely conform to the assumptions of any of the well-studied models. In this
paper we introduce a method called Network Infusion (NI) that has been designed
to circumvent these issues, making source inference practical for large,
complex real-world networks. The key idea is that, to infer the source node in
the network, full characterization of the diffusion dynamics may, in many
cases, not be necessary. This objective is achieved by creating a diffusion
kernel that well approximates standard diffusion models such as the
susceptible-infected (SI) diffusion model, but lends itself to inversion, by
design, via likelihood maximization or error minimization. An important
property of this path-based network diffusion kernel is its tunability: it can
consider different numbers of shortest paths in the kernel computation. This
resembles, vaguely, a Taylor expansion of the network topology that forms a
diffusion kernel with different orders of expansion, and one can extend this
key idea to design other network diffusion kernels that approximate other
general diffusion models such as the susceptible-infected-recovered (SIR)
model. We apply NI to both single-source and multi-source diffusion, to both
single-snapshot and multi-snapshot observations, and to both homogeneous and
heterogeneous diffusion setups. We prove the mean-field optimality of NI for
different scenarios and demonstrate its effectiveness over several synthetic
networks. Moreover, we apply NI to a real-data application, identifying news
sources in the Digg social network, and demonstrate the effectiveness of NI
compared to existing methods.
INTRODUCTION

Information from a single node (entity) can reach other 
nodes (entities) by propagation over network connections. 
For instance, a viral infection (either computer or biological) 
can propagate to different nodes in a network and become 
an epidemic, while rumors can spread in a social network 
through social interactions. Even a financial failure of an 
institution can have cascading effects on other financial 
entities and may lead to a financial crisis. As a final example, 
in some human diseases, abnormal activity of a few encoding
genes, for example transcription factors, can cause their
target genes, and therefore some essential biological
processes to fail to operate normally in the cell. In order to 
gain insight into these processes, mathematical models have 
been developed, primarily focusing on application to the 
study of virus propagation in networks. 

A well-established continuous-time diffusion model for viral 
epidemics is known as the susceptible-infected (SI) model, 
where infected nodes spread the virus to their neighbors 
probabilistically. For that diffusion model, prior studies explore the
relationship between network structure, infection rate, and
the size of epidemics, while others consider learning SI model
parameters. Other diffusion methods use random walks to
model information spread and label propagation in 
networks. These references study the forward problem of 
signal diffusion. Source inference is the inverse problem. It 
aims to infer source nodes in a network by merely knowing 
the network structure and observing the information spread 
at single or multiple snapshots (Figure 1). Even within the 
context of the well-studied diffusion kernels, source 
inference is a difficult problem in great part owing to the 
presence of path multiplicity in the network. 
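To make the forward problem concrete, here is a minimal Python sketch of a homogeneous continuous-time SI process. It assumes that every edge transmits after an independent exponential delay with one common rate. Under that assumption, a node's infection time equals its shortest-path distance from the source when the sampled delays are used as edge lengths, so a single Dijkstra pass yields every infection time. A snapshot is then the set of nodes infected by the observation time. The function names and the adjacency-dict input format are hypothetical choices made for illustration.

```python
import heapq
import random


def si_infection_times(adj, source, rate=1.0, seed=None):
    """Infection times of a homogeneous continuous-time SI process.

    Assumes integer node labels and an undirected graph given as
    {node: [neighbours]}. Each edge gets one Exp(rate) transmission delay;
    only the direction from the earlier-infected endpoint ever matters.
    """
    rng = random.Random(seed)
    delay = {}
    for u in adj:
        for v in adj[u]:
            key = (min(u, v), max(u, v))
            if key not in delay:
                delay[key] = rng.expovariate(rate)

    # Dijkstra: earliest arrival time of the infection at every reachable node.
    times = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in adj[u]:
            cand = t + delay[(min(u, v), max(u, v))]
            if cand < times.get(v, float("inf")):
                times[v] = cand
                heapq.heappush(heap, (cand, v))
    return times


def snapshot(times, t):
    """Nodes already infected at observation time t."""
    return {v for v, tv in times.items() if tv <= t}
```

Source inference reverses this computation: given only the graph and the output of `snapshot`, it tries to recover `source`.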

Figure 1: Network Infusion (NI) aims to identify source node(s) by
reversing information propagation in the network. NI is
based on a path-based network diffusion process that closely
approximates the observed diffusion pattern, while leading
to a tractable source inference method for large complex
networks. The displayed network and infection pattern are
parts of the Digg social news network.

Recently, the inverse
problem of a diffusion process in a network under a discrete
time memoryless diffusion model with known time steps
has been considered, while the problem of
identifying seed nodes (effectors) of a partially activated
network in the steady state of an Independent-Cascade
model has also been investigated. Another work has considered
the source inference problem using incomplete diffusion
traces by maximizing the likelihood of the trace under the
learned model. The problem setups and diffusion models
considered in those works differ from the continuous-time
diffusion setup considered in the present paper. Another
approach uses the Minimum Description Length principle to
identify source nodes with a heuristic cost function that
combines the model cost and the data cost. Moreover, for the
case of a single source node in the network, some
methods infer the source node based on distance centrality
or degree centrality measures of the infected subgraph.
These methods can be applied efficiently to large networks,
but, amongst other drawbacks, their performance generally
lacks provable guarantees. For tree structures under a
homogeneous SI diffusion model, one approach computes a maximum
likelihood solution for the source inference problem and
provides provable guarantees for its performance. Over tree
structures, its solution is equivalent to the distance
centrality of the infected subgraph. The problem of inferring
multiple sources in the network has an additional
combinatorial complexity compared to the single-source
case.
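The distance-centrality heuristic mentioned above fits in a few lines. The following is an illustrative Python sketch. It assumes an unweighted, undirected graph given as an adjacency dict and a connected infected subgraph. `distance_center` is a hypothetical name. It returns the infected node with the smallest total hop distance to all other infected nodes, with distances measured inside the infected subgraph.

```python
from collections import deque


def bfs_distances(adj, start, allowed):
    """Hop distances from start, moving only through nodes in `allowed`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_center(adj, infected):
    """Infected node minimizing the sum of distances to all other infected nodes.

    Candidates that cannot reach every infected node inside the infected
    subgraph are skipped, so a connected infected subgraph is assumed.
    """
    infected = set(infected)
    best, best_cost = None, float("inf")
    for s in sorted(infected):  # sorted for a deterministic tie-break
        dist = bfs_distances(adj, s, infected)
        if len(dist) < len(infected):
            continue
        cost = sum(dist.values())
        if cost < best_cost:
            best, best_cost = s, cost
    return best
```

A degree-centrality variant would instead rank infected nodes by their number of infected neighbours. Both heuristics are cheap to compute, which is why they scale to large graphs despite lacking general guarantees.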

One prior study has considered this problem under an
Independent Cascade (IC) model and introduced a
polynomial-time algorithm based on dynamic programming
for some special cases. Source inference in real-world
networks is made more challenging because the true propagation
dynamics are typically unknown, and when they have been
directly measured, they rarely conform to the assumptions
of well-studied canonical diffusion models such as SI,
partly owing to heterogeneous information diffusion over
network edges, latent sources of information, noisy network
connections, non-memoryless transport, and so forth. As an
example, we consider news spread over the Digg social news
network for more than 3,500 news stories. We find that in
approximately 65% of cases, nodes that received the
news at a time t did not have any neighbors who had already
received the news by that time, violating the most basic
conditional independence assumption of the SI model.
Furthermore, the empirical distribution of remaining news
propagation times over edges of the Digg social news
network cannot be approximated closely by a single
distribution of a homogeneous SI diffusion model, even by
fitting a general Weibull distribution to the observed data.
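The 65% figure counts nodes that received the news without any neighbor having received it earlier. The following Python sketch shows one way to compute such a fraction from observed infection times. It assumes an undirected adjacency dict and a dict of infection times for the infected nodes only. It excludes the earliest-infected node(s), since a source cannot have an earlier-infected neighbor. The Digg data are not reproduced here. The function name is hypothetical, and the counting rules behind the reported figure (for example, the handling of ties) may differ from this sketch.

```python
def fraction_without_infected_neighbor(adj, infection_time):
    """Fraction of infected nodes with no neighbour infected strictly earlier.

    adj: {node: [neighbours]}; infection_time: {node: time} for infected nodes.
    Nodes sharing the earliest time are treated as sources and excluded.
    Under an SI model every remaining node should have an earlier-infected
    neighbour, so a large value signals a departure from SI assumptions.
    """
    first = min(infection_time.values())
    candidates = [v for v, t in infection_time.items() if t > first]
    if not candidates:
        return 0.0
    violations = 0
    for v in candidates:
        t_v = infection_time[v]
        has_earlier_neighbor = any(
            infection_time.get(u, float("inf")) < t_v for u in adj.get(v, ())
        )
        if not has_earlier_neighbor:
            violations += 1
    return violations / len(candidates)
```

For example, on the path `0-1-2-3` with toy times `{0: 0.0, 1: 1.0, 3: 2.0}`, node 3 is infected although its only neighbor (node 2) never is, so the function returns 0.5.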

Owing to the high computational complexity of solving the
source inference problem under the well-studied SI diffusion
models, and considering that those kernels are
unlikely to match a real-world diffusion precisely, our key
idea for solving the inverse problem is to identify a diffusion
process that closely approximates the observed diffusion
pattern but also leads to a tractable source inference
method by design. Thus, we develop a diffusion kernel that is
distinct from the standard SI diffusion models, but whose order
of diffusion approximates many of them well in various
setups. We shall show that this kernel leads to a source
inference method that can be computed efficiently for
large complex networks, and we shall provide theoretical
performance guarantees under general conditions. The key
original observation, from both a theoretical and a practical
perspective, is that one does not need to know the full
dynamics of the diffusion to solve the inverse problem;
instead, the inversion can be carried out using statistics
that are consistent across many diffusion models.
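The following minimal Python sketch shows how a path-based kernel can be inverted by likelihood maximization. It rests on simplifying assumptions: homogeneous i.i.d. exponential edge delays with rate λ, and only the single shortest path between each pair of nodes. (The kernel described here can also consider several shortest paths; this sketch omits that.) Under these assumptions, the probability that node j is infected by time t when i is the source is the Erlang CDF of the hop distance d = d(i, j): P_ij(t) = 1 - sum_{n=0}^{d-1} e^{-λt} (λt)^n / n!. For a single snapshot, the sketch scores each infected candidate by sum over infected j of log P_ij(t) plus sum over uninfected j of log(1 - P_ij(t)), and returns the best one. All names are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Unweighted shortest-path (hop) distances from source via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def erlang_cdf(d, rate, t):
    """P[sum of d i.i.d. Exp(rate) delays <= t]; d == 0 is the source itself."""
    if d == 0:
        return 1.0
    x = rate * t
    term, tail = math.exp(-x), 0.0
    for n in range(d):
        tail += term           # adds e^{-x} x^n / n!
        term *= x / (n + 1)    # advance to e^{-x} x^{n+1} / (n+1)!
    return max(0.0, 1.0 - tail)


def log_likelihood(adj, source, infected, t, rate=1.0, eps=1e-12):
    """Snapshot log-likelihood under the single-shortest-path kernel."""
    dist = hop_distances(adj, source)
    ll = 0.0
    for j in adj:
        p = erlang_cdf(dist[j], rate, t) if j in dist else 0.0
        p = min(max(p, eps), 1.0 - eps)  # clip to keep both logs finite
        ll += math.log(p) if j in infected else math.log(1.0 - p)
    return ll


def infer_source(adj, infected, t, rate=1.0):
    """Maximum-likelihood source among the infected nodes."""
    return max(sorted(infected),
               key=lambda i: log_likelihood(adj, i, infected, t, rate))
```

On a 7-node path with nodes 2, 3 and 4 infected at t = 1, the middle node 3 scores highest. Restricting candidates to infected nodes reflects the fact that, under this kernel, the source is always infected.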

LITERATURE SURVEY 

1. Statistical Inference of Computer Virus Propagation Using Non-Homogeneous Poisson Processes by H. Okamura, K. Tateishi, and T. Dohi

This paper presents statistical inference of computer virus
propagation using non-homogeneous Poisson processes (NHPPs).
Under some mathematical assumptions, the number of infected
hosts can be modeled by an NHPP. In particular, the paper
applies a framework of mixed-type NHPPs to the statistical
inference of periodic virus propagation. A mixed-type NHPP
is defined as a superposition of NHPPs. In numerical
experiments, the authors examine a goodness-of-fit measure of
NHPPs fitted to real virus infection data and discuss the
effectiveness of the model-based prediction approach for
computer virus propagation. They develop statistical models
that describe computer virus propagation based on NHPPs. In
particular, when the logistic and extreme-value distributions
are applied to the infection time distribution, the resulting
mean behavior of the NHPP models is exactly the same as the
well-known logistic and Gompertz curves. Thus the NHPP
framework essentially contains traditional regression
analysis. The authors also introduce mixed-type NHPP models
to represent the propagation of computer viruses. Since
mixed-type NHPP models can express periodic infection
phenomena, they outperform the usual non-mixed NHPPs with a
unimodal infection time distribution in terms of goodness of
fit. For statistical estimation, they propose an EM algorithm
so that model parameters can be estimated easily from virus
infection data. In numerical experiments, they performed
Kolmogorov-Smirnov (KS) tests of the NHPP models on 116 kinds
of virus data, and the results indicate that all of these
virus infections can be modeled by NHPPs. They also compared
the prediction abilities of the proposed mixed-type NHPP
models with those of conventional regression models based on
the logistic and Gompertz curves. They found that, although
the mixed-type NHPP models could fit every kind of infection
data in the numerical experiments, their prediction ability
is insufficient to estimate future virus infections, even
when the mixed-type NHPP is used.
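The superposition idea can be illustrated with a short Python sketch. It assumes logistic infection-time distributions for every wave, so the NHPP mean value function is m(t) = sum_k ω_k F_k(t), where ω_k is the expected size of wave k. Under an NHPP, the count in an interval [a, b) is Poisson with mean m(b) - m(a), which gives a binned log-likelihood for observed counts. The paper's exact parameterization and its EM estimation procedure are not reproduced. A Gompertz-type component would swap in an extreme-value CDF. All names are hypothetical.

```python
import math


def logistic_cdf(t, mu, s):
    """Numerically stable logistic CDF with location mu and scale s."""
    z = (t - mu) / s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mean_value(t, components):
    """Mixed-type NHPP mean value m(t) = sum_k omega_k * F_k(t).

    components: list of (omega, mu, s) tuples, one per infection wave.
    """
    return sum(w * logistic_cdf(t, mu, s) for w, mu, s in components)


def binned_log_likelihood(counts, edges, components):
    """Poisson log-likelihood of counts observed in bins [edges[i], edges[i+1])."""
    ll = 0.0
    for n, (a, b) in zip(counts, zip(edges, edges[1:])):
        lam = mean_value(b, components) - mean_value(a, components)
        ll += n * math.log(lam) - lam - math.lgamma(n + 1)
    return ll
```

With two components, m(t) rises in two separate steps, which is how a mixed-type model captures periodic (multi-wave) infection data that a single unimodal NHPP cannot fit well.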

2. Spotting Culprits in Epidemics: How many and Which ones? by B. A. Prakash, J. Vreeken, and C. Faloutsos

Given a snapshot of a large graph in which an infection has
been spreading for some time, can we identify the nodes from
which the infection started to spread? In other words, can
we reliably tell who the culprits are? The authors answer this
question affirmatively and give an efficient method called
NETSLEUTH for the well-known Susceptible-Infected (SI) virus
propagation model. Essentially, they seek the set of seed
nodes that best explains the given snapshot. They propose to
use the Minimum Description Length (MDL) principle to identify
the best set of seed nodes and virus propagation ripple as the
one by which the infected graph can be described most
succinctly. They give a highly efficient algorithm to identify
likely sets of seed nodes given a snapshot. Then, given these
seed nodes, they show that the virus propagation ripple can be
optimized in a principled way by maximizing likelihood. With
all three combined, NETSLEUTH can automatically identify the
correct number of seed nodes, as well as which nodes are the
culprits. Experiments show high accuracy in the detection of
seed nodes, in addition to correct automatic identification of
their number. Moreover, NETSLEUTH scales linearly in the number
of nodes of the graph. In summary, the paper studies finding
culprits, the challenging problem of identifying the nodes from
which an infection in a graph started to spread. It proposes to
use the MDL principle to identify the set of seed nodes from
which the given snapshot can be described most succinctly, and
presents NETSLEUTH (based on a novel 'submatrix-Laplacian'
approach), a highly efficient algorithm both for identifying
the set of seed nodes that best describes the given situation
and for automatically choosing the best number of seed nodes,
in contrast to the state of the art.
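The 'submatrix-Laplacian' idea can be sketched for the first seed only. The sketch rests on our reading of NETSLEUTH, which is an assumption: L_I is the Laplacian L = D - A of the whole graph restricted to the infected rows and columns, and the best single seed is the node with the largest entry in the eigenvector of the smallest eigenvalue of L_I. The MDL scoring, the choice of the number of seeds, and the ripple optimization are not shown. To stay within the standard library, the sketch runs power iteration on c·I - L_I, with c above every eigenvalue of L_I, instead of calling an eigensolver. The function name is hypothetical.

```python
def first_seed(adj, infected, iters=500):
    """Hypothetical first-seed step: argmax of the smallest eigenvector of L_I.

    adj: {node: [neighbours]} for the whole graph. Degrees come from the whole
    graph, so infected nodes bordering uninfected ones are penalized.
    """
    nodes = sorted(infected)
    idx = {v: k for k, v in enumerate(nodes)}
    deg = [len(adj[v]) for v in nodes]
    # Gershgorin: every eigenvalue of L_I is at most 2 * max degree, so the
    # largest eigenvector of c*I - L_I is the smallest eigenvector of L_I.
    c = 2 * max(deg) + 1
    x = [1.0] * len(nodes)
    for _ in range(iters):
        # (c*I - L_I) x = (c - d_v) x_v + sum of x_u over infected neighbours u
        y = []
        for k, v in enumerate(nodes):
            s = (c - deg[k]) * x[k]
            s += sum(x[idx[u]] for u in adj[v] if u in idx)
            y.append(s)
        norm = max(y)  # entries stay positive: nonnegative matrix, positive diagonal
        x = [val / norm for val in y]
    return nodes[max(range(len(nodes)), key=lambda k: x[k])]
```

On a path where an interior block of nodes is infected, the score peaks in the middle of the block. If the block touches a dead end of the graph, the peak shifts toward that end, because no uninfected neighbor lies beyond it.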

3. The Effect of Network Topology on the Spread of Epidemics by A. Ganesh, L. Massoulié, and D. Towsley

Many network phenomena are well modeled as spreads of
epidemics through a network. Prominent examples include the
spread of worms and email viruses and, more generally, faults.
Many types of information dissemination can also be modeled
as spreads of epidemics. In this paper the authors address the
question of what makes an epidemic either weak or potent. More
precisely, they identify topological properties of the graph
that determine the persistence of epidemics. In particular,
they show that if the ratio of cure to infection rates is
larger than the spectral radius of the graph, then the mean
epidemic lifetime is of order log n, where n is the number of
nodes. Conversely, if this ratio is smaller than a
generalization of the isoperimetric constant of the graph,
then the mean epidemic lifetime is of order e^{na}, for a
positive constant a. They apply these results to several
network topologies, including the hypercube, which is a
representative connectivity graph for a distributed hash
table; the complete graph, which is an important connectivity
graph for BGP; and the power-law graph, of which the AS-level
Internet graph is a prime example.

They also study the star topology and the Erdős-Rényi graph,
as their epidemic spreading behaviors determine the spreading
behavior of power-law graphs. In summary, they present a
preliminary analysis of how topology affects the spread of an
epidemic, motivated by networking phenomena such as worms and
viruses, cascading failures, and the dissemination of
information. They derive sufficient conditions under which
epidemics die out either quickly (logarithmically in the size
of the network) or slowly (exponentially in the size of the
network).
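The fast-extinction condition above compares the cure-to-infection ratio with the spectral radius (largest adjacency eigenvalue) of the graph. The following Python sketch computes the spectral radius by power iteration and checks that sufficient condition. The iteration runs on A + I rather than A: bipartite graphs such as stars have eigenvalues +ρ and -ρ of equal magnitude, and the shift makes the dominant eigenvalue unique so that the iteration converges. The function names are hypothetical. The second condition, involving the generalized isoperimetric constant, is not computed here.

```python
import math


def spectral_radius(adj, iters=1000):
    """Largest adjacency eigenvalue of an undirected graph via power iteration on A + I."""
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    rho = 0.0
    for _ in range(iters):
        # y = (A + I) x
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm_y = math.sqrt(sum(val * val for val in y.values()))
        norm_x = math.sqrt(sum(val * val for val in x.values()))
        rho = norm_y / norm_x - 1.0  # undo the +I shift
        x = {v: val / norm_y for v, val in y.items()}
    return rho


def dies_out_fast(adj, cure_rate, infection_rate):
    """Sufficient condition for O(log n) mean epidemic lifetime: ratio > spectral radius."""
    return cure_rate / infection_rate > spectral_radius(adj)
```

For the complete graph on four nodes the spectral radius is 3, and for a star with four leaves it is 2. This gives a concrete feel for how dense topologies raise the cure rate needed for an epidemic to die out quickly.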

4. Explosive Percolation in Random Networks by D. Achlioptas, R. M. D’Souza, and J. Spencer

Networks in which the formation of connections is governed
by a random process often undergo a percolation transition,
wherein, around a critical point, the addition of a small
number of connections causes a sizable fraction of the network
to suddenly become linked together. Typically such
transitions are continuous, so that the fraction of the
network linked together tends to zero right above the
transition point. Whether percolation transitions could be
discontinuous has been an open question. The authors show that
incorporating a limited amount of choice in the classic
Erdős-Rényi network formation model causes its percolation
transition to become discontinuous. Besides water turning
into ice or steam, other prototypical phase transitions are
the spontaneous emergence of magnetization and
superconductivity in metals, the epidemic spread of disease,
and the dramatic change in connectivity of networks and
lattices known as percolation. Perhaps the most fundamental
characteristic of a phase transition is its order, i.e.,
whether the macroscopic quantity it affects changes
continuously or discontinuously at the transition. Continuous
(smooth) transitions are called second-order and include
many magnetic phenomena, while discontinuous (abrupt)
transitions are called first-order, a typical example being
the discontinuous drop in entropy when liquid water turns
into solid ice at 0°C.
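The "limited amount of choice" can be illustrated with the product rule studied in explosive percolation. At each step, two random candidate edges are drawn, and the edge whose endpoint components have the smaller size product is kept. The Erdős-Rényi baseline simply adds one random edge per step. The following Python sketch tracks the largest component with a union-find structure. The parameters and names are hypothetical, and a finite simulation only illustrates the abrupt transition; it does not prove it.

```python
import random


class DisjointSets:
    """Union-find that tracks component sizes and the largest component."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def largest_fraction(n, steps, rule="product", seed=0):
    """Add `steps` edges to n isolated nodes; return the largest-component fraction.

    rule="er": one uniformly random edge per step (Erdos-Renyi process).
    rule="product": two random candidate edges per step; keep the one whose
    endpoint components have the smaller product of sizes.
    """
    rng = random.Random(seed)
    ds = DisjointSets(n)

    def random_edge():
        return rng.randrange(n), rng.randrange(n)

    def size_product(edge):
        u, v = edge
        return ds.size[ds.find(u)] * ds.size[ds.find(v)]

    for _ in range(steps):
        edge = random_edge()
        if rule == "product":
            edge = min(edge, random_edge(), key=size_product)
        ds.union(*edge)
    return ds.largest / n
```

Sweeping `steps / n` and plotting both rules shows the product-rule curve staying near zero longer and then rising much more steeply than the Erdős-Rényi curve.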

5. A tutorial introduction to Bayesian inference for stochastic epidemic models using Markov chain Monte Carlo methods by P. D. O’Neill

Recent Bayesian methods for the analysis of infectious
disease outbreak data using stochastic epidemic models are
reviewed. These methods rely on Markov chain Monte Carlo
techniques. Both temporal and non-temporal data are
considered. The methods are illustrated with a number of
examples featuring different models and datasets. Depending
on the application in question, models may incorporate latent
periods, variable infectivity, reduced susceptibility following
recovery, and so on. Likewise, aspects of population
heterogeneity, such as age structure, varying susceptibility,
and differential mixing rates between groups of individuals,
can be incorporated as appropriate. As with any statistical
modeling, there is a balance between models that are too
complicated for the data to inform fully and those that are
too simplistic to be taken seriously as a basis for useful
inference. In practice, it is not always clear how to achieve
this balance via a formal procedure; issues of model adequacy
and goodness of fit are not particularly well developed in the
literature, compared with situations in which the data
essentially consist of repeated independent observations.
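The MCMC machinery reviewed above can be illustrated on a deliberately tiny example. The sketch assumes a toy model, not one of the tutorial's examples: each of `exposed` contacts is infected independently with probability p, `infected` of them are observed infected, and p has a Beta(a, b) prior. A random-walk Metropolis sampler draws from the posterior of p. Because this prior is conjugate, the exact posterior mean (infected + a) / (exposed + a + b) is available to check the sampler. All names are hypothetical.

```python
import math
import random


def log_posterior(p, infected, exposed, a=1.0, b=1.0):
    """Unnormalized log posterior: Binomial likelihood times Beta(a, b) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return ((infected + a - 1) * math.log(p)
            + (exposed - infected + b - 1) * math.log(1.0 - p))


def metropolis(infected, exposed, n_iter=20000, step=0.05, seed=0):
    """Random-walk Metropolis samples of the per-contact infection probability."""
    rng = random.Random(seed)
    p = 0.5
    lp = log_posterior(p, infected, exposed)
    samples = []
    for _ in range(n_iter):
        q = p + rng.gauss(0.0, step)  # symmetric proposal, so no Hastings correction
        lq = log_posterior(q, infected, exposed)
        # Accept with probability min(1, exp(lq - lp)); proposals outside (0, 1)
        # have lq = -inf and are always rejected.
        if rng.random() < math.exp(min(0.0, lq - lp)):
            p, lp = q, lq
        samples.append(p)
    return samples
```

With 30 infections among 100 contacts and a uniform prior, the post-burn-in sample mean should be close to the exact value 31/102. Real epidemic models replace this likelihood with one over infection (and often unobserved) event times, which is where MCMC becomes essential.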

6. Mixing patterns in networks by M. E. Newman 

The author studies assortative mixing in networks, the
tendency for vertices in networks to be connected to other
vertices that are like (or unlike) them in some way. He
considers mixing according to discrete characteristics, such
as language or race in social networks, and scalar
characteristics such as age. As a special example of the
latter, he considers mixing according to vertex degree, i.e.,
according to the number of connections vertices have to other
vertices: do gregarious people tend to associate with other
gregarious people? He proposes a number of measures of
assortative mixing appropriate to the various mixing types,
and applies them to a variety of real-world networks, showing
that assortative mixing is a pervasive phenomenon found in
many networks. He also proposes several models of
assortatively mixed networks, both analytic ones based on
generating function methods and numerical ones based on
Monte Carlo graph generation techniques. These models are used
to study the properties of networks as their level of
assortativity is varied. In the particular case of mixing by
degree, strong variation with assortativity is found in the
connectivity of the network and in the resilience of the
network to the removal of vertices. Assortative mixing can
have a profound effect on the structural properties of a
network. For instance, assortative mixing of a network by a
discrete characteristic will tend to break the network up
into separate communities. If people prefer to be friends with
others who speak their own language, for example, then one
might expect countries with more than one language to separate
into communities by language. Assortative mixing by age could
cause stratification of societies along age lines.
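Degree assortativity is commonly summarized by a coefficient r, the Pearson correlation between the degrees at the two ends of an edge. The following Python sketch uses the standard edge-list form of r for an undirected simple graph. It covers degree mixing only; mixing by discrete characteristics such as language is measured differently, from a mixing matrix. The function name is hypothetical.

```python
def degree_assortativity(edges):
    """Degree assortativity r of an undirected simple graph given as an edge list.

    r = (<j k> - <(j + k)/2>^2) / (<(j^2 + k^2)/2> - <(j + k)/2>^2),
    where j, k are the degrees at the two ends of an edge and <.> averages over edges.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    m = len(edges)
    mean_prod = sum(deg[u] * deg[v] for u, v in edges) / m
    mean_end = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / m
    mean_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / m
    denom = mean_sq - mean_end ** 2
    if denom == 0:
        raise ValueError("r is undefined when every edge end has the same degree")
    return (mean_prod - mean_end ** 2) / denom
```

A star is perfectly disassortative (r = -1), since every edge joins the hub to a degree-1 leaf. Positive r means high-degree vertices tend to attach to other high-degree vertices, the "gregarious people" pattern described above.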